10.3 Probability Distribution Functions: Bell-Shaped
Probability Concepts
There are several ways that a probability can be assigned or calculated for an event. One method is to have knowledge about the conditions of the event and all possible outcomes and thus be able to calculate the probability. For example, rolling a single die will produce one of six possible outcomes—values 1 through 6. If the die is balanced, then each side has an equal chance of occurring, so the probability of throwing any particular number, such as 4, is 1/6, or 16.67 percent. Likewise, when throwing two dice, we can enumerate the number of ways each total (from 2 to 12) can occur. From that, the probability of rolling a given total is the number of combinations that produce it divided by the total number of combinations. For example, the probability of rolling a 12 is 1/36 = 2.78%.
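The two-dice calculation can be verified by enumeration; a minimal Python sketch:

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two balanced dice.
outcomes = list(product(range(1, 7), repeat=2))

def p_total(total):
    """Probability of a given total: favorable outcomes / total outcomes."""
    favorable = sum(1 for a, b in outcomes if a + b == total)
    return favorable / len(outcomes)

print(round(p_total(12), 4))  # 0.0278, i.e., 1/36
print(round(p_total(7), 4))   # 0.1667, i.e., 6/36
```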
Another approach is to run many trials and count the frequency of the results. This might be called the relative frequency method. So, in our example, suppose the dice are not balanced but are weighted; then the probabilities would not be equal. One way to determine how these unbalanced dice will land is to run many trials, say 1,000 or 10,000, and see how frequently each number occurs. We can then calculate the probability of each value from its relative frequency.
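The relative frequency method can be sketched in Python; the weights below are a hypothetical imbalance (face 6 twice as likely as each other face), not taken from the text:

```python
import random
from collections import Counter

random.seed(42)  # make the trials reproducible

faces = [1, 2, 3, 4, 5, 6]
weights = [1, 1, 1, 1, 1, 2]  # hypothetical unbalanced die

trials = 10_000
rolls = random.choices(faces, weights=weights, k=trials)
counts = Counter(rolls)

# Relative frequency estimates the probability of each face.
# Theoretical values: 1/7 ≈ 0.143 for faces 1-5, 2/7 ≈ 0.286 for face 6.
est = {face: counts[face] / trials for face in faces}
for face in faces:
    print(face, round(est[face], 3))
```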
A final approach might be based on expert opinion. Many times, people who have extensive knowledge about a subject will predict a future outcome with a certain likelihood or probability. For example, some experts in the economy will predict that there is an 80% chance that the economy will grow by 3% in the coming year. This approach is much more subjective and prone to mistaken forecasts.
Population and Samples
Before we look at specific probability distributions, let’s first note the difference between a population and a sample. A population consists of all items of a particular group or category. For example, the population of “adults who are registered to vote” consists of everyone who is of voting age and has registered. Another population might be all the students at a university. This population of students has various properties, each with a mean and a standard deviation. Some interesting properties might be their grade point average, their annual income, or the number of hours they study each week. Some of these properties can be calculated for the entire population: since we have the data for every student’s GPA, for example, we can calculate the mean and standard deviation of the population.
However, other properties cannot be easily measured, so analysts take a sample of the population. A sample is a subset of the population that can be used to calculate a mean and a standard deviation. The purpose of a sample is to obtain sufficient data on a property so that a valid estimate of the shape of the distribution along with the mean and standard deviation of the population can be inferred. Of course, one of the major challenges is to obtain a sample that is large enough and that accurately represents the population so that valid inferences can be made.
In books and literature on statistics, there are several common symbols used to represent the various statistical values:
- x̄ or x-bar — This is the sample average or the sample mean.
- µ — This is the population mean (mu).
- s — This is the sample standard deviation.
- σ — This is the population standard deviation (sigma).
In the case of the property of “hours spent studying each week,” a sample can be obtained by interviewing a certain number of students. This sample can be used to infer the “hours spent studying each week” of the entire student body population. As we will see later, there are also tests to verify that our sample is large enough and close enough to the population to make a valid inference.
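Python’s standard `statistics` module makes the same distinction between sample statistics (x̄, s) and population statistics (µ, σ). A small sketch using hypothetical study-hour data for eight interviewed students:

```python
import statistics

# Hypothetical sample: hours spent studying each week for 8 students.
hours = [12, 15, 9, 20, 11, 14, 18, 13]

x_bar = statistics.mean(hours)    # sample mean, x-bar
s = statistics.stdev(hours)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(hours)  # population formula (divides by n)

print(x_bar)            # 14
print(round(s, 3))      # sample s is slightly larger than pstdev
print(round(sigma, 3))
```

Note that `stdev` (the sample formula, s) always exceeds `pstdev` (the population formula, σ) for the same data, because dividing by n − 1 corrects for the fact that a sample underestimates the population spread.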
Discrete versus Continuous
As was noted earlier, numeric data types enable us to calculate values such as mean and median. Many types of experiments naturally have numerical outcomes. Such tests include the results of the roll of a pair of dice, or time between eruptions of Old Faithful, or ACT test scores for students graduating from a certain high school. These values, referred to as random variables, can be either discrete or continuous. Continuous probabilities mean that any value within a given range can occur. Discrete probabilities only allow specific values to occur. For example, rolling the dice only yields discrete values from 2 to 12. However, the time between eruptions of Old Faithful is continuous and not bound to specific values. In general, experiments that yield integer values are discrete, while experiments that yield real-valued measurements are continuous.
A probability density function can be plotted on a graph to show the relative probability of each possible value. The values on the left axis of a probability density function range from zero up to the probability (or density) of the most likely single value.
A cumulative probability function gives the probability that a random variable is less than or equal to a specific value. If we measure individual probabilities as numbers between 0 and 1, then the cumulative probability at a value is the total area under the probability density curve up to that value.
Figure 10.10 illustrates a normal distribution curve on the left and its cumulative distribution function on the right. The red curve is the standard normal distribution with a mean of 0.0 and a standard deviation of 1.0. The vertical axis on each chart is a measure of the probability and goes from 0.0 to 1.0.1 For the standard normal distribution (red), the probability density at zero is 0.399, as seen in the chart on the left. The probability of sampling a value of zero or less is 0.5, as seen in the cumulative distribution function in the right chart.
Figure 10.11 illustrates the discrete probability density function for rolling a pair of dice. Only integer values between 2 and 12 are possible. Again, the probability values are shown on the left axis. For example, the probability of rolling a 7 is 16.67%, but the probability of rolling a 12 is only 2.78%.
Common Sampling Distributions
How accurate are the measures of central tendency and variability described above for a set of data? Over many years, statisticians built sampling distributions for different sample statistics in different situations; these have been extensively studied and codified as part of traditional statistics. Examples include the Student’s t-distribution, the F-distribution, and the chi-square distribution.
A sampling distribution shows every possible result a statistic can take in every possible sample from a population and how often each result happens. It is thus the probability distribution of the values for a statistic. Before electronic computers, these sampling distributions simplified hypothesis testing and confidence intervals (statistical inference) by allowing us to quantify potential error in an estimate that might be due to random variation.
These sampling distributions were published in tables, in which the probability of a statistic could be looked up and compared to the current problem to see how likely it was that random variation played a role in the outcome. We will cover the normal, Student’s t, binomial, Poisson, and exponential distributions. This discussion will help you know which distribution to use to model a given situation.
In this section, we will present those distributions that have a “bell” shape. The most common bell-shaped distribution is the normal distribution. But we will see there are several other distributions that also are bell-shaped and where the central tendency characteristic is strong. In the following section, we will present other distributions that do not follow the central tendency so strongly.
Normal Distribution
In several of the previous figures, we have illustrated various versions of a normal distribution. Exactly what is a normal distribution? A normal distribution is a bell-shaped graph that describes a symmetrical arrangement of data. The two basic measures for a normal distribution are its mean or average, which is the center point of the graph, and its standard deviation, which describes how the data is dispersed around the mean. For a normal distribution, 68% of the observations are within +/- one standard deviation of the mean, 95% are within +/- two standard deviations, and 99.7% are within +/- three standard deviations. Figure 10.12 illustrates a normal curve with standard deviations marked.
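The 68/95/99.7 rule can be checked directly with `statistics.NormalDist` from Python’s standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability mass within ±1, ±2, and ±3 standard deviations of the mean.
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)
    print(k, round(within * 100, 1))  # 68.3, 95.4, 99.7
```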
Use of the normal distribution is motivated by several factors. First, it approximates many natural phenomena. For example, many types of demographic data naturally form a normal distribution. Such data as people’s heights, IQ scores, income levels in the country, shoe sizes, birth weights, SAT scores and other test scores, plus many other demographic data are all distributed in a bell curve.
The other major motivation is called the Central Limit Theorem. This theorem states that average values calculated from independently distributed random variables have approximately normal distributions, regardless of the type of distribution from which the original variables are sampled. In other words, when various independent factors combine to influence a particular set of data, the resulting dataset forms a bell-shaped curve. This curve is called a Gaussian distribution, which is another name for the normal distribution.
Excel has two functions that work with the normal curve.
- NORM.DIST (x, mean, standard_deviation, Cumulative) — If Cumulative is FALSE, it returns the probability of the value x occurring in a normal curve with the given mean and standard_deviation; if Cumulative is TRUE, it returns the cumulative probability of x. In other words, it returns either the height of the distribution curve at x (FALSE) or the value of the cumulative distribution curve at x (TRUE).
- NORM.INV (probability, mean, standard_deviation) — Returns the inverse of the cumulative NORM.DIST function. In other words, it returns the x value whose cumulative probability equals the given probability. For example, for a probability of .9 on the standard normal distribution, it returns +1.281552.
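For readers working outside Excel, Python’s standard-library `statistics.NormalDist` provides close analogues of these two functions (a sketch):

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)  # standard normal distribution

# NORM.DIST(x, 0, 1, FALSE) corresponds to pdf; TRUE corresponds to cdf.
print(round(nd.pdf(0.0), 3))      # 0.399, density at the mean
print(round(nd.cdf(0.0), 3))      # 0.5, cumulative probability at the mean

# NORM.INV(probability, 0, 1) corresponds to inv_cdf.
print(round(nd.inv_cdf(0.9), 6))  # 1.281552
```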
A normal distribution can be broad and flat or narrow and peaked. Sometimes it is beneficial to compare the normal curves of different datasets. One way to compare values is to calculate a z-score for a particular point on the curve. The z-score is a measure of how close or how far a value is from the mean in terms of the standard deviation. The z-score in essence converts any normal distribution curve to the standard normal distribution. The formula to calculate the z-score is:

z = (x − µ) / σ

where x is the data value, µ is the mean, and σ is the standard deviation.
The answer is a number as compared to the standard deviation. A negative number is to the left of the mean, and a positive number is to the right. A number less than 1 means less than one standard deviation away from the mean. For example, a z-score of 1.5 for a value such as 75 indicates that the number 75 is 1.5 times the standard deviation greater than the mean.
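The z-score calculation is a one-liner; in this sketch, the mean of 60 and standard deviation of 10 are hypothetical values chosen so that 75 yields a z-score of 1.5, matching the example above:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical distribution with mean 60 and standard deviation 10:
print(z_score(75, 60, 10))  # 1.5, i.e., 1.5 standard deviations above the mean
print(z_score(45, 60, 10))  # -1.5, the same distance below the mean
```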
Using the Normal Distribution
As an example of using the normal distribution, let’s suppose a teacher created a new final exam. After the students completed the exam, he calculated the mean to be 68 with a standard deviation of 15. He would like the highest 10% of the students to receive an A grade. Since test scores in populations are frequently distributed normally, the teacher is going to assume the class scores also follow a normal distribution. Figure 10.13 illustrates the problem—to find the test score that gives an A to 10% of the students.
The easiest way to find the answer is to use the NORM.INV function in Excel. We know the probability point is at 90%. In other words, we want to find the point where the cumulative probability is 90%. So we simply enter the parameters in the Excel function:

=NORM.INV(0.90, 68, 15), which returns 87.2.
For another way to answer this question, we will use a table of z-scores to find that point on a standard normal distribution and then use the z-score equation to convert that z-score to our answer value. We will use the variable "x" as the unknown value.
As can be seen from the formula above, z-scores for the left side of a normal curve are negative, and z-scores for the right half are positive. Standardized z-scores are often given in tables, which can easily be found on the internet.2 In our example, we want to find the z-score that splits the distribution curve into 90% and 10% portions. Figure 10.14 gives a z-score table with .89973 as the closest value to the desired 90%. We will accept that value as close enough for our purposes. Reading the row and column for this value, we see that it corresponds to a z-score of 1.28. (Note: This answer seems reasonable. Referring to Figure 10.12, a 90% value will fall between one and two standard deviations.)
Next we convert our z-score to the test score using the above equation. Rearranging the equation to solve for x yields the following result:

x = µ + z × σ = 68 + (1.28 × 15) = 87.2
Thus, students earning above 87.2 on the exam will get an A grade. This same process can be used to give a certain percentage for B grades and so on. The calculations will be a little more complex when going across the midpoint mean because z-scores are negative for values below the mean.
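The teacher’s cutoff can be double-checked in Python with `statistics.NormalDist` (a sketch of the same calculation, not part of the original example):

```python
from statistics import NormalDist

scores = NormalDist(mu=68, sigma=15)  # exam mean 68, standard deviation 15

# Direct inverse lookup, as with Excel's NORM.INV(0.90, 68, 15):
cutoff = scores.inv_cdf(0.90)
print(round(cutoff, 1))  # 87.2

# Equivalent z-score route: x = mean + z * sd, with z ≈ 1.28 from the table.
x = 68 + 1.28 * 15
print(round(x, 1))  # 87.2
```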
Student’s t-Distribution
The t-distribution (also called Student’s t-distribution) is a family of distributions that look almost identical to the normal distribution curve, only a bit shorter and fatter. The t-distribution is used instead of the normal distribution when you have small samples. The larger the sample size, the more the t-distribution looks like the normal distribution. In fact, for sample sizes larger than 20 (i.e., more degrees of freedom), the distribution is almost exactly like the normal distribution. The t-distribution is symmetric and bell-shaped, like the normal distribution. However, the t-distribution has heavier tails, meaning that it is more prone to producing values that fall far from its mean.
Figure 10.15 shows various cases for the standard t-distribution. The degrees of freedom are equal to n-1, where n is the sample size. As can be seen in the figure, as the degrees of freedom (df = n-1) get larger, the t-distribution curve approaches the normal curve. The t-distribution is often used as a reference for the distribution of a sample mean and the difference between two sample means.3
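The convergence toward the normal curve can be seen numerically by writing the t-distribution’s density directly from its standard formula using `math.gamma` (a sketch; in practice one would usually call a library routine such as scipy’s):

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

normal_peak = 1 / math.sqrt(2 * math.pi)  # standard normal density at 0, ≈ 0.3989

# As the degrees of freedom grow, the peak of the t density rises
# toward the normal peak (the curve gets less "short and fat").
for df in (1, 5, 30):
    print(df, round(t_pdf(0, df), 4))  # ≈ 0.3183, 0.3796, 0.3956
print(round(normal_peak, 4))
```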
Bernoulli and Binomial Distributions
A Bernoulli distribution is a discrete distribution with two possible outcomes. We often label these outcomes as 1 for success and 0 for failure. A simple example is the toss of a balanced coin: the probability of heads (1) is 50%, and the probability of tails (0) is also 50%.
The probabilities do not have to be the same. Assume that our Bernoulli trial is whether a roll of a single die will give a 6 or not. In this case the outcomes are success (1), a 6 is rolled, and failure (0), a 6 is not rolled. Success has a probability of 1/6 = 16.67%, and failure has a probability of 5/6 = 83.33%. The two probabilities sum to 100%.
Now suppose that we want to expand these experiments. For the coin toss, we want to know the probability of getting heads a certain number of times out of a number of trials. For that test, we will use the binomial distribution. Again, the binomial distribution is used for discrete outcomes, such as flipping a coin, rolling a die, or pass/fail on an exam.
A binomial distribution models the number of successes (frequency) in a number of binomial trials with a specified probability of success. A binomial trial is an event with two discrete outcomes (like a coin flip). With a large number of trials (when the probability is close to 0.50), the shape of the binomial distribution becomes so close to the normal distribution that it can be approximated by the normal distribution. Figure 10.16 illustrates the Binomial Probability Mass Function (PMF) compared with the Normal Probability Distribution Function (PDF).
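The binomial PMF can be computed directly with `math.comb` (a minimal sketch): the probability of exactly k successes in n trials is C(n, k) · p^k · (1 − p)^(n − k).

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Ten fair coin flips: probability of exactly 5 heads.
print(round(binom_pmf(5, 10, 0.5), 4))  # 0.2461

# The PMF over all possible outcomes sums to 1.
print(round(sum(binom_pmf(k, 10, 0.5) for k in range(11)), 6))  # 1.0
```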
Excel has two functions that work with the binomial distribution:
- BINOM.DIST (number_s, trials, probability_s, Cumulative) — Returns the probability of number_s successes in the given number of trials, with a probability_s success rate for each trial (between 0 and 1). It returns the event probability if Cumulative is FALSE, and the cumulative probability if Cumulative is TRUE.
- BINOM.INV (trials, probability_s, alpha) — Returns the smallest number of successes in the given number of trials, with probability_s of success for each trial (between 0 and 1), for which the cumulative binomial probability is greater than or equal to alpha (between 0 and 1).
Figure 10.17 illustrates the use of BINOM.DIST for a coin toss where the probability of success for each trial is 0.50. In this case the number of trials is 10, and the number of successes is 0 through 10, as shown in the table. The chart shows both probabilities, the event probability and the cumulative probability.
The next figure shows a similar experiment, but represents the probabilities for rolling a die. In this case the probability of success is 0.1667, which is the probability of rolling a 6 on each trial. Again, we use 10 trials. Figure 10.18 shows the table of 0 through 10 successes with the associated event probability and cumulative probability. Notice how the event probability rapidly decreases for more than 4 successes.
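A table like the one in Figure 10.18 can be reproduced from the binomial PMF; this sketch uses p = 1/6 and 10 trials as in the text:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 1 / 6   # probability of rolling a 6 on each trial
n = 10      # number of trials

# Event probability and running cumulative probability for 0..10 successes.
cumulative = 0.0
for k in range(n + 1):
    pk = binom_pmf(k, n, p)
    cumulative += pk
    print(k, round(pk, 4), round(cumulative, 4))  # k = 0 gives (5/6)^10 ≈ 0.1615
```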